59 research outputs found

    Predicting protein-protein binding sites in membrane proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Many integral membrane proteins, like their non-membrane counterparts, form either transient or permanent multi-subunit complexes in order to carry out their biochemical function. Computational methods that provide structural details of these interactions are needed since, despite their importance, relatively few structures of membrane protein complexes are available.</p> <p>Results</p> <p>We present a method for predicting which residues are in protein-protein binding sites within the transmembrane regions of membrane proteins. The method uses a Random Forest classifier trained on residue type distributions and evolutionary conservation for individual surface residues, followed by spatial averaging of the residue scores. The prediction accuracy achieved for membrane proteins is comparable to that for non-membrane proteins. Also, like previous results for non-membrane proteins, the accuracy is significantly higher for residues distant from the binding site boundary. Furthermore, a predictor trained on non-membrane proteins was found to yield poor accuracy on membrane proteins, as expected from the different distribution of surface residue types between the two classes of proteins. Thus, although the same procedure can be used to predict binding sites in membrane and non-membrane proteins, separate predictors trained on each class of proteins are required. Finally, the contribution of each residue property to the overall prediction accuracy is analyzed and prediction examples are discussed.</p> <p>Conclusion</p> <p>Given a membrane protein structure and a multiple alignment of related sequences, the presented method gives a prioritized list of which surface residues participate in intramembrane protein-protein interactions. The method has potential applications in guiding the experimental verification of membrane protein interactions, structure-based drug discovery, and also in constraining the search space for computational methods, such as protein docking or threading, that predict membrane protein complex structures.</p

    Towards Universal Structure-Based Prediction of Class II MHC Epitopes for Diverse Allotypes

    Get PDF
    The binding of peptide fragments of antigens to class II MHC proteins is a crucial step in initiating a helper T cell immune response. The discovery of these peptide epitopes is important for understanding the normal immune response and its misregulation in autoimmunity and allergies and also for vaccine design. In spite of their biomedical importance, the high diversity of class II MHC proteins combined with the large number of possible peptide sequences make comprehensive experimental determination of epitopes for all MHC allotypes infeasible. Computational methods can address this need by predicting epitopes for a particular MHC allotype. We present a structure-based method for predicting class II epitopes that combines molecular mechanics docking of a fully flexible peptide into the MHC binding cleft followed by binding affinity prediction using a machine learning classifier trained on interaction energy components calculated from the docking solution. Although the primary advantage of structure-based prediction methods over the commonly employed sequence-based methods is their applicability to essentially any MHC allotype, this has not yet been convincingly demonstrated. In order to test the transferability of the prediction method to different MHC proteins, we trained the scoring method on binding data for DRB1*0101 and used it to make predictions for multiple MHC allotypes with distinct peptide binding specificities including representatives from the other human class II MHC loci, HLA-DP and HLA-DQ, as well as for two murine allotypes. The results showed that the prediction method was able to achieve significant discrimination between epitope and non-epitope peptides for all MHC allotypes examined, based on AUC values in the range 0.632–0.821. We also discuss how accounting for peptide binding in multiple registers to class II MHC largely explains the systematically worse performance of prediction methods for class II MHC compared with those for class I MHC based on quantitative prediction performance estimates for peptide binding to class II MHC in a fixed register

    Learning a peptide-protein binding affinity predictor with kernel ridge regression

    Get PDF
    We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalize eight kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for it's approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of accurately predicting the binding affinity of any peptide to any protein. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. On all benchmarks, our method significantly (p-value < 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. The method should be of value to a large segment of the research community with the potential to accelerate peptide-based drug and vaccine development.Comment: 22 pages, 4 figures, 5 table

    Predicting Peptide Binding Affinities to MHC Molecules Using a Modified Semi-Empirical Scoring Function

    Get PDF
    The Major Histocompatibility Complex (MHC) plays an important role in the human immune system. The MHC is involved in the antigen presentation system assisting T cells to identify foreign or pathogenic proteins. However, an MHC molecule binding a self-peptide may incorrectly trigger an immune response and cause an autoimmune disease, such as multiple sclerosis. Understanding the molecular mechanism of this process will greatly assist in determining the aetiology of various diseases and in the design of effective drugs. In the present study, we have used the Fresno semi-empirical scoring function and modify the approach to the prediction of peptide-MHC binding by using open-source and public domain software. We apply the method to HLA class II alleles DR15, DR1, and DR4, and the HLA class I allele HLA A2. Our analysis shows that using a large set of binding data and multiple crystal structures improves the predictive capability of the method. The performance of the method is also shown to be correlated to the structural similarity of the crystal structures used. We have exposed some of the obstacles faced by structure-based prediction methods and proposed possible solutions to those obstacles. It is envisaged that these obstacles need to be addressed before the performance of structure-based methods can be on par with the sequence-based methods

    Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies but the results are often not satisfactory. Although different amino acid compositions among the interaction sites of different complexes have been observed, such differences have not been integrated into the prediction process. Furthermore, the evolution information has not been exploited to achieve a more powerful propensity.</p> <p>Result</p> <p>In this study, the residue interface propensities of four kinds of complexes (homo-permanent complexes, homo-transient complexes, hetero-permanent complexes and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are inputted to the support vector machine for the prediction of protein binding sites. Such propensities are further improved by taking evolutional information into consideration, which results in a class of novel propensities at the profile level, i.e. the binary profiles interface propensities. Experiment is performed on the 1139 non-redundant protein chains. Although different residue interface propensities among different complexes are observed, the improvement of the classifier with residue interface propensities can be negligible in comparison with that without propensities. The binary profile interface propensities can significantly improve the performance of binding sites prediction by about ten percent in term of both precision and recall.</p> <p>Conclusion</p> <p>Although there are minor differences among the four kinds of complexes, the residue interface propensities cannot provide efficient discrimination for the complicated interfaces of proteins. The binary profile interface propensities can significantly improve the performance of binding sites prediction of protein, which indicates that the propensities at the profile level are more accurate than those at the residue level.</p

    Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Recently, revealing the function of proteins with protein-protein interaction (PPI) networks is regarded as one of important issues in bioinformatics. With the development of experimental methods such as the yeast two-hybrid method, the data of protein interaction have been increasing extremely. Many databases dealing with these data comprehensively have been constructed and applied to analyzing PPI networks. However, few research on prediction interaction sites using both PPI networks and the 3D protein structures complementarily has explored.</p> <p>Results</p> <p>We propose a method of predicting interaction sites in proteins with unknown function by using both of PPI networks and protein structures. For a protein with unknown function as a target, several clusters are extracted from the neighboring proteins based on their structural similarity. Then, interaction sites are predicted by extracting similar sites from the group of a protein cluster and the target protein. Moreover, the proposed method can improve the prediction accuracy by introducing repetitive prediction process.</p> <p>Conclusions</p> <p>The proposed method has been applied to small scale dataset, then the effectiveness of the method has been confirmed. The challenge will now be to apply the method to large-scale datasets.</p

    MultiRTA: A simple yet reliable method for predicting peptide binding affinities for multiple class II MHC allotypes

    Get PDF
    abstract: Background The binding of peptide fragments of antigens to class II MHC is a crucial step in initiating a helper T cell immune response. The identification of such peptide epitopes has potential applications in vaccine design and in better understanding autoimmune diseases and allergies. However, comprehensive experimental determination of peptide-MHC binding affinities is infeasible due to MHC diversity and the large number of possible peptide sequences. Computational methods trained on the limited experimental binding data can address this challenge. We present the MultiRTA method, an extension of our previous single-type RTA prediction method, which allows the prediction of peptide binding affinities for multiple MHC allotypes not used to train the model. Thus predictions can be made for many MHC allotypes for which experimental binding data is unavailable. Results We fit MultiRTA models for both HLA-DR and HLA-DP using large experimental binding data sets. The performance in predicting binding affinities for novel MHC allotypes, not in the training set, was tested in two different ways. First, we performed leave-one-allele-out cross-validation, in which predictions are made for one allotype using a model fit to binding data for the remaining MHC allotypes. Comparison of the HLA-DR results with those of two other prediction methods applied to the same data sets showed that MultiRTA achieved performance comparable to NetMHCIIpan and better than the earlier TEPITOPE method. We also directly tested model transferability by making leave-one-allele-out predictions for additional experimentally characterized sets of overlapping peptide epitopes binding to multiple MHC allotypes. In addition, we determined the applicability of prediction methods like MultiRTA to other MHC allotypes by examining the degree of MHC variation accounted for in the training set. An examination of predictions for the promiscuous binding CLIP peptide revealed variations in binding affinity among alleles as well as potentially distinct binding registers for HLA-DR and HLA-DP. Finally, we analyzed the optimal MultiRTA parameters to discover the most important peptide residues for promiscuous and allele-specific binding to HLA-DR and HLA-DP allotypes. Conclusions The MultiRTA method yields competitive performance but with a significantly simpler and physically interpretable model compared with previous prediction methods. A MultiRTA prediction webserver is available at http://bordnerlab.org/MultiRTA.The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-48

    PREDIVAC: CD4+T-cell epitope prediction for vaccine design that covers 95% of HLA class II DR protein diversity

    Get PDF
    Background: CD4+ T-cell epitopes play a crucial role in eliciting vigorous protective immune responses during peptide (epitope)-based vaccination. The prediction of these epitopes focuses on the peptide binding process by MHC class II proteins. The ability to account for MHC class II polymorphism is critical for epitope-based vaccine design tools, as different allelic variants can have different peptide repertoires. In addition, the specificity of CD4+ T-cells is often directed to a very limited set of immunodominant peptides in pathogen proteins. The ability to predict what epitopes are most likely to dominate an immune response remains a challenge

    A mathematical and computational review of Hartree-Fock SCF methods in Quantum Chemistry

    Get PDF
    We present here a review of the fundamental topics of Hartree-Fock theory in Quantum Chemistry. From the molecular Hamiltonian, using and discussing the Born-Oppenheimer approximation, we arrive to the Hartree and Hartree-Fock equations for the electronic problem. Special emphasis is placed in the most relevant mathematical aspects of the theoretical derivation of the final equations, as well as in the results regarding the existence and uniqueness of their solutions. All Hartree-Fock versions with different spin restrictions are systematically extracted from the general case, thus providing a unifying framework. Then, the discretization of the one-electron orbitals space is reviewed and the Roothaan-Hall formalism introduced. This leads to a exposition of the basic underlying concepts related to the construction and selection of Gaussian basis sets, focusing in algorithmic efficiency issues. Finally, we close the review with a section in which the most relevant modern developments (specially those related to the design of linear-scaling methods) are commented and linked to the issues discussed. The whole work is intentionally introductory and rather self-contained, so that it may be useful for non experts that aim to use quantum chemical methods in interdisciplinary applications. Moreover, much material that is found scattered in the literature has been put together here to facilitate comprehension and to serve as a handy reference.Comment: 64 pages, 3 figures, tMPH2e.cls style file, doublesp, mathbbol and subeqn package
    corecore